ABSTRACT

This  work  examined  minimax  linear  estimation  in  multiple 
linear  regression.  The  application  of  minimax  estimation  to 
regression  led  to  the  development  of  ridge  regression  estimators 
with stochastic ridge parameters. These estimators were seen
to be invariant under linear transformation, a property that
has not been established for other ridge estimators. These
minimax-motivated estimators were examined in several simulation
studies. In particular, flaws in other simulation studies of
ridge estimators were identified. Consequently, an improved
simulation procedure was used. It was observed from these
studies  that,  contrary  to  published  statements,  a  ridge  estimator 
can  be  considerably  superior  to  the  ordinary  least  squares 
estimator,  especially  when  high  pairwise  correlations  exist 
among  the  regression  variables.  Robustness  considerations  were 
used  to  suggest  a  requirement  that  a  "good"  generalized  ridge 
regression  estimator  should  satisfy. 
ABSTRACT

This paper considers minimax linear estimation of the parameters
in a multiple linear regression model. Recent results
are summarized, and some new results, including a transformation
invariance property of minimax estimation, are given. These
minimax estimators of the parameter vector can also be classified
as ridge regression estimators with nonstochastic ridge parameters.
Some ridge regression estimators with stochastic ridge
parameters can be motivated by minimax estimation considerations.
These minimax-motivated estimators are examined in several
simulation studies and some observations are made based on these
simulations and minimax theory.


1. INTRODUCTION


The usual multiple linear regression model is

$$y = X\beta + \varepsilon$$

where $\varepsilon$ is a vector of uncorrelated random variables with mean
zero and variance $\sigma^2$, and $X$ is a full rank $n \times q$ matrix with
$q < n$. The usual procedure for estimating $\beta$ is the least squares
method. It is well known that this method is equivalent to
minimum variance unbiased linear (MVUL) estimation. Similarly,
the usual method for estimating a given linear combination of
the coefficients is MVUL. In recent years, many people have
attempted to reduce the mean squared error (MSE) by allowing
some bias in their estimators. One such biased estimation
procedure is ridge regression, which was first studied by Hoerl
and Kennard. Ridge regression estimators have the form

$$\beta^*(k) = (X'X + kI)^{-1} X'y$$

where the ridge parameter $k$ is either nonstochastic (based on
prior information) or stochastic. It follows from a result in

[...]

minimized by [...]. This result has been previously observed
in [17] for the case [...] by Lagrange multiplier methods. See
[16] for a discussion of results similar to Theorem 3.

One can easily check by orthogonal transformation methods
similar to those on [...] of [7] that if


g  A<) 


LP*  (k)  ]  'lb  (<) 


then $g$ is a decreasing function of $k$. Hence, it follows easily
(by supposing not and then contradicting Theorem 2) that if

[...]
3.  1. f,.* . ii i\  1  G, \Li


Within the past five years, many ridge estimators have appeared
in the literature and have been compared in subsequent simulation
studies. Golub, Heath, and Wahba [5] state that two dozen is
probably a conservative estimate of the number of such estimators.

Many practitioners have used the ridge-trace approach to
select the value of the ridge constant. This approach is highly
subjective, however, and has also been criticized on other grounds
(see, e.g., Smith and Campbell). Consequently, it seems as
though more formal methods for selecting the ridge constant
need to be employed. The ridge estimator presented in Section


where $k$ is stochastic and equal to $s^2/\hat{\beta}'\hat{\beta}$, has been the focal
point in our simulation studies. Some of these results are
presented later in this section. We have also studied the
shrinkage estimator presented in Example 1, but it did not
fare as well as the ridge estimator.
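As a minimal sketch of how a ridge estimate with this kind of stochastic
ridge constant could be computed, the following pure-Python example may
help. It assumes that the coefficient vector in the ratio is the least
squares estimate and that $s^2$ is the residual mean square with divisor
$n - q$; neither detail is legible in the text above. The function names
and the small data set are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge(x, y, k):
    """Return (X'X + kI)^{-1} X'y for x given as a list of n rows of length q."""
    q = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) + (k if i == j else 0.0)
            for j in range(q)] for i in range(q)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(q)]
    return solve(xtx, xty)


def stochastic_ridge(x, y):
    """Ridge estimate with k = s^2 / (b'b), b being the least squares fit.

    Assumption: s^2 uses the divisor n - q.
    """
    n, q = len(x), len(x[0])
    b = ridge(x, y, 0.0)  # k = 0 reduces to least squares
    resid = [yi - sum(r * bj for r, bj in zip(row, b)) for row, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - q)
    k = s2 / sum(bj * bj for bj in b)
    return k, ridge(x, y, k)


# Hypothetical data, for illustration only.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
y = [1.1, 1.9, 3.2, -0.9]
k, b_ridge = stochastic_ridge(x, y)
b_ols = ridge(x, y, 0.0)
print(k, b_ols, b_ridge)
```

Because $k > 0$ whenever the residuals are not all zero, the ridge
coefficients in this sketch are shorter in norm than the least squares
coefficients.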

Before undertaking a simulation study of regression estimators,
an experimenter must select an appropriate loss function, and
choose a mechanism for generating data that is representative
of actual data. The loss function most frequently used is

$$(\hat{\beta} - \beta)'(\hat{\beta} - \beta) \qquad (6)$$

where $\hat{\beta}$ denotes an estimator of $\beta$ (see, e.g., Gibbons [4] and
Hemmerle and Brantle [6]). We have used (6) in addition to the loss

$$(\hat{\beta} - \beta)'X'X(\hat{\beta} - \beta) \qquad (7)$$


which was also used by Dempster, Schatzoff, and Wermuth [3].
It can be argued (as they did) that (6) would be an appropriate
loss function if the primary goal of a regression study is
estimation of the parameters, whereas (7) would be appropriate
if the regression was to be used for prediction (since (7) can
be written as $(X\hat{\beta} - X\beta)'(X\hat{\beta} - X\beta)$, and $X\beta$ is predicted by
$X\hat{\beta}$). The loss function $(y - X\hat{\beta})'(y - X\hat{\beta})$ would be inappropriate
for either estimation or prediction since the loss is minimized
when $\hat{\beta}$ is the least squares estimator, as was mentioned in
Section 2.
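The two loss functions can be illustrated with a short sketch. The
function names and the numbers below are hypothetical; the second
function computes (7) through its prediction form, that is, as the
squared length of $X(\hat{\beta} - \beta)$.

```python
def loss_estimation(b_hat, beta):
    """Loss (6): squared distance between the estimate and the true coefficients."""
    return sum((bh - b) ** 2 for bh, b in zip(b_hat, beta))


def loss_prediction(b_hat, beta, x):
    """Loss (7): (b_hat - beta)' X'X (b_hat - beta), computed as |X (b_hat - beta)|^2."""
    diff = [bh - b for bh, b in zip(b_hat, beta)]
    fitted_gap = [sum(r * d for r, d in zip(row, diff)) for row in x]
    return sum(g * g for g in fitted_gap)


# Hypothetical design, true coefficients, and estimate.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
beta = [1.0, 2.0]
b_hat = [1.5, 1.0]

loss6 = loss_estimation(b_hat, beta)
loss7 = loss_prediction(b_hat, beta, x)
print(loss6, loss7)
```

The two losses can rank estimators differently, because (7) weights each
direction of error by how strongly the design matrix expresses it.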

If we adopt either (6) or (7) or both, we then must decide
what $X$ and $\beta$ to use as well as the method for selecting values
of $\sigma^2$. Newhouse and Oman [10] showed that, assuming $X$, $\sigma^2$, and
$k$ to be fixed, $MSE(\beta^*(k))$ is maximized, for $\beta'\beta = 1$, when $\beta$ is
the normalized eigenvector corresponding to the smallest eigenvalue
of $X'X$, and minimized when $\beta$ is the normalized eigenvector
corresponding to the largest eigenvalue. [...] use these two
choices for $\beta$ in their simulation studies. $X$ is selected in such
a way as to make $X'X$ an equicorrelation matrix, and observations
on the dependent variable are generated as

$$y = X\beta + \varepsilon \qquad (8)$$

where $\varepsilon$ is $N(0, \sigma^2 I)$. Several values of $\sigma^2$ are then used in
conjunction with each of several $X$ matrices. The loss for each
estimator is then determined using either (6) or (7).

This general procedure has two major shortcomings. First,
we would not expect to encounter an equicorrelation matrix in
actual data. Second, although the two choices for $\beta$ lead
to the best and to the worst case for ridge regression
(in terms of mean squared error), we advise against use of the
smallest eigenvalue case. The reason can be discerned from
Table 1. In particular, we should notice what happens to the
average value of $R^2$ (the coefficient of multiple determination)
as we move, keeping $\sigma^2$ fixed, from the well-conditioned matrix


in the top part of the table, to the highly ill-conditioned
matrix in the bottom part of the table. One can see in Table 1
that $R^2$ will be approximately $q/n$ in the smallest eigenvalue
case for moderate $\sigma^2$ when $X'X$ is highly ill-conditioned. [...]
we are thus unable to see how poorly a ridge estimator performs
relative to least squares for reasonable values of $R^2$. Since
progressive ill-conditioning tends to cause $R^2$ for the smallest
eigenvalue case to be much different from $R^2$ for the largest
eigenvalue case, making $\sigma^2$ smaller so as to produce higher $R^2$
values for the smallest eigenvalue case would tend to make $R^2$
almost exactly 1.0 in the largest eigenvalue case.

For these reasons, we have used a different simulation
procedure. We have also generated observations on $y$ as in (8),
but, for each trial, a $\beta$ vector is generated from a uniform
distribution on the collection of all norm-one $q$-element vectors.

A normal$(0, \sigma^2 I)$ error vector $\varepsilon$ is then generated and $y$ is
computed as $y = X\beta + \varepsilon$. Thisted cautions against the use
of random $\beta$ in simulation studies which include different classes
of estimators. We see nothing wrong, however, with using $\beta$
vectors that are uniform on the collection of all norm-one vectors
when comparing estimators within a particular class such as
ridge estimators.
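One trial of this data-generating procedure can be sketched as follows.
The sketch assumes the standard device of normalizing a vector of
independent standard normal draws to obtain a direction that is uniform
on the unit sphere; the text does not say how the uniform vectors were
actually drawn. The function names, the design matrix, and the seed are
hypothetical.

```python
import math
import random


def random_unit_vector(q, rng):
    """Draw a vector uniformly from the unit sphere in q dimensions.

    Assumption: uses the normalized-Gaussian construction.
    """
    while True:
        z = [rng.gauss(0.0, 1.0) for _ in range(q)]
        norm = math.sqrt(sum(v * v for v in z))
        if norm > 0.0:
            return [v / norm for v in z]


def simulate_trial(x, sigma, rng):
    """Return (beta, y) with beta uniform on the unit sphere and y = X beta + eps.

    sigma is the error standard deviation, so the error variance is sigma**2.
    """
    beta = random_unit_vector(len(x[0]), rng)
    y = [sum(r * b for r, b in zip(row, beta)) + rng.gauss(0.0, sigma)
         for row in x]
    return beta, y


# Hypothetical design matrix and error standard deviation.
rng = random.Random(0)
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
beta, y = simulate_trial(x, 0.1, rng)
print(beta, y)
```

Drawing a fresh $\beta$ on every trial averages the loss over directions
rather than fixing attention on the single best or worst direction for
ridge regression.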

As will be seen later, the main objectives of our numerical
studies have been: (1) to compare our ridge estimator to the
much-referenced Hoerl, Kennard, and Baldwin
[5] estimator, and (2) to determine under what conditions our
ridge constant or a multiple of our ridge constant would be
appropriate.

In addition to our different procedure for [...], we have [...]
$X'X$ matrices in a different manner. [...]
[13] to generate several progressively ill-conditioned matrices.

The results of our major simulation study are shown in
Table 2. The following can be discerned from inspection of the
table. For a particular degree of ill-conditioning, the size of
the [...] large $R^2$ implies the need for a small ridge constant,
constant.
We agree that exact linear constraints are "discomforting"
(and seemingly unrealistic, as well) and,
consequently, prefer constraints of the form $\beta'\beta \le h^2$,
or more generally $\beta'T\beta \le h^2$, where $T$ is a positive
definite matrix and $h$ is a positive constant. There is no
boundedness assumption on $\beta'\beta$ inherent in the least
squares procedure, and, in practice, one might set an a
priori upper bound on reasonable values of $\beta'\beta$. The degree
of superiority of a biased estimator over least squares
will naturally depend on how sharp a bound on $\beta'\beta$ an
experimenter can impose. Suppose that based on prior
information only, one believes that $\sigma^{-2}\beta'\beta$ is approximately
equal to a number $h^2$. Kuks and Olman (1972)
showed that if $h^2 \ge \sigma^{-2}\beta'\beta$, then the ridge regression
estimator with ridge parameter $k = 1/h^2$ will outperform
the least squares estimator in terms of mean squared
error. Consequently, if fairly accurate prior information
about $\sigma^{-2}\beta'\beta$ is available, this ridge estimator is preferable
to the least squares estimator. Lacking prior information,
we can still obtain, by estimating $h^2$ from
sample data, an estimator that tends to be better than
least squares. It follows from this line of reasoning that
estimators with this underlying motivation will not have
"loose theoretical underpinnings."

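The Kuks and Olman condition can be checked numerically. The sketch
below works in canonical form, assuming $X'X$ has been rotated to a
diagonal matrix of eigenvalues and using total mean squared error of
the coefficient estimates as the criterion; the eigenvalues,
coefficients, and error variance are hypothetical values chosen to mimic
an ill-conditioned problem.

```python
def mse_ridge(eigenvalues, alpha, sigma2, k):
    """Total MSE of the ridge estimator in canonical (diagonalized) form.

    Each term is variance plus squared bias for one component:
    sigma2 * lam / (lam + k)^2  +  (k * a)^2 / (lam + k)^2.
    Setting k = 0 gives the least squares MSE, sum of sigma2 / lam.
    """
    return sum((sigma2 * lam + (k * a) ** 2) / (lam + k) ** 2
               for lam, a in zip(eigenvalues, alpha))


eigenvalues = [2.5, 1.0, 0.4, 0.01]  # hypothetical, ill-conditioned X'X
alpha = [0.5, -0.5, 0.5, 0.5]        # hypothetical coefficients, alpha'alpha = 1
sigma2 = 0.25                        # hypothetical error variance

bound = sum(a * a for a in alpha) / sigma2  # beta'beta / sigma^2
mse_ols = mse_ridge(eigenvalues, alpha, sigma2, 0.0)

# Every h^2 below satisfies h^2 >= beta'beta / sigma^2, so k = 1 / h^2.
results = []
for h2 in (bound, 2 * bound, 10 * bound):
    k = 1.0 / h2
    results.append((h2, k, mse_ridge(eigenvalues, alpha, sigma2, k), mse_ols))

for row in results:
    print(row)
```

In this example the gain is largest when $h^2$ equals the bound, and it
shrinks as $h^2$ grows and $k$ approaches zero, which is consistent with
the remark that the advantage depends on how sharp a bound can be
imposed.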
Smith and Campbell note that ridge regression
estimators are not invariant under model transformations
of the form

$$y = (XA)(A^{-1}\beta) + \varepsilon. \qquad (1)$$

Peele and Ryan (1979) discuss minimax linear estimation
based on prior information of the form $\beta'T\beta \le \sigma^2 h^2$,
which includes ordinary and generalized ridge regression
[...] are invariant under model transformations of the
form (1).
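The invariance can be verified directly. The following is a sketch that
assumes the estimator takes the generalized ridge form
$\beta^*_T(k) = (X'X + kT)^{-1}X'y$ and that $A$ is nonsingular; the
symbols $\tilde{X}$, $\gamma$, and $\tilde{T}$ are introduced here for
illustration only.

```latex
\begin{align*}
\tilde{X} &= XA, \qquad \gamma = A^{-1}\beta, \qquad
\beta'T\beta = \gamma'(A'TA)\gamma
\;\Rightarrow\; \tilde{T} = A'TA, \\
\gamma^*(k) &= (\tilde{X}'\tilde{X} + k\tilde{T})^{-1}\tilde{X}'y
             = \bigl(A'(X'X + kT)A\bigr)^{-1}A'X'y \\
            &= A^{-1}(X'X + kT)^{-1}X'y
             = A^{-1}\beta^*_T(k).
\end{align*}
```

If instead the identity matrix is used in both parameterizations, the
transformed estimator is
$(A'X'XA + kI)^{-1}A'X'y = A^{-1}\bigl(X'X + k(AA')^{-1}\bigr)^{-1}X'y$,
which for $k > 0$ agrees with $A^{-1}\beta^*(k)$ for every $y$ only
when $AA' = I$. Under these assumptions, this is the sense in which
carrying the prior information along with the transformation restores
invariance.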

In his contribution to the discussion, Van Nostrand
briefly mentioned specific conflicting results from
several simulation studies. It is not surprising that the
results can differ greatly from one study to the next
since no "standard" simulation procedure is being used.
In point of fact, it seems that those of us who have
performed these studies have fallen into a number of
pitfalls. In a regression simulation study, one should use
appropriate values of the error variance $\sigma^2$. The use of
correlation form implies that one should choose $\sigma^2$ close
to zero in order to generate data with reasonable $R^2$
values. Unfortunately, $R^2$ is quite sensitive to small
changes in $\sigma^2$; consequently, great care must be exercised
in order to avoid producing small $R^2$ values. Also, the
usual procedure of letting $\beta$ be represented by the
eigenvectors corresponding to the largest and smallest
eigenvalues of $X'X$ is objectionable. In the smallest eigenvalue
case, $R^2$ will be approximately $q/n$ when the smallest
eigenvalue is quite small and $\sigma^2$ is at least one order of
magnitude larger. Thus, with $q = $ [...] and $n = 100$ (a
typical choice) we could hardly say that we have generated
representative regression data when $R^2$ is approximately
[...]. Making $\sigma^2$ smaller in an effort to rectify
the problem would only cause $R^2$ to be almost exactly
1.0 in the largest eigenvalue case. There is clearly a need
for a better approach.
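The $q/n$ behavior can be seen from a short calculation. This is a
heuristic sketch, not a derivation from the discussion itself: it uses
the uncentered $R^2$ (no intercept), replaces the expectation of a
ratio by a ratio of expectations, and writes $H = X(X'X)^{-1}X'$ for
the projection onto the column space of $X$ and $\lambda_{\min}$ for
the smallest eigenvalue of $X'X$, both introduced here for
illustration.

```latex
\begin{align*}
\beta'\beta = 1,\quad X'X\beta = \lambda_{\min}\beta
  &\;\Rightarrow\; \|X\beta\|^2 = \lambda_{\min}, \\
\hat{y} = Hy = X\beta + H\varepsilon
  &\;\Rightarrow\; E\|\hat{y}\|^2 = \lambda_{\min} + q\sigma^2, \\
y = X\beta + \varepsilon
  &\;\Rightarrow\; E\|y\|^2 = \lambda_{\min} + n\sigma^2, \\
R^2 \approx \frac{E\|\hat{y}\|^2}{E\|y\|^2}
  = \frac{\lambda_{\min} + q\sigma^2}{\lambda_{\min} + n\sigma^2}
  &\;\longrightarrow\; \frac{q}{n}
  \quad\text{as } \lambda_{\min}/\sigma^2 \to 0 .
\end{align*}
```

In words, when $\beta$ lies along the weakest direction of $X$, the
signal $X\beta$ is negligible next to the noise, so the fit explains
about as much as $q$ regressors would explain of pure noise.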
Comment


Certain prior knowledge about intrinsic measurement
errors can be cleverly incorporated in ridge regression
by adding $p$ fictitious observations to the data. Consider
a standardized regression model $y = X\beta + u$, where
$X'X$ is the correlation matrix based on $n$ observations
among $p$ regressors and $X'y$ is a vector of correlation
coefficients with the dependent variable. There are

* H. D. Vinod is Supervisor, [...] Studies, Econometric Analysis
Section, American Telephone and Telegraph Co., Piscataway,
NJ 08854.


measurement errors in each regressor, so that the available
data are indistinguishable from other data where one
may add a number between [...] beyond the
last published digit. There may be a similar error in each
mean ($\bar{x}$) and standard deviation (SD). The largest
measurement error [...] in each standardized regressor
[...], where $i = 1, \ldots, p$, is


© Journal of the American Statistical Association
March 1980, Volume 75, Number 369
Theory and Methods Section


